Link Prediction in Very Large Directed Graphs: Exploiting Hierarchical Properties in Parallel
Abstract
Link prediction is a link mining task that tries to find new edges within a given graph. Among the targets of link prediction are large directed graphs, which are common structures nowadays. The typical sparsity of large graphs demands high-precision predictions in order to obtain usable results. However, the size of those graphs only permits the execution of scalable algorithms. As a trade-off between those two problems, we recently proposed a link prediction algorithm for directed graphs that exploits hierarchical properties. The algorithm can be classified as a local score, which entails scalability. Unlike other local scores, our proposal assumes the existence of an underlying model for the data, which allows it to produce predictions with higher precision. We test the validity of its hierarchical assumptions on two clearly hierarchical data sets, one of them based on RDF. Then we test it on a non-hierarchical data set based on Wikipedia to demonstrate its broad applicability. Given the computational complexity of link prediction in very large graphs, we also introduce some general recommendations for making link prediction an efficiently parallelized problem.
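To illustrate what a "local score" means in this setting, the following minimal Python sketch ranks candidate edges in a directed graph by counting two-step paths u -> w -> v. This is a generic common-neighbours style score chosen for illustration; it is not the hierarchical score the abstract refers to, whose definition is not given here. The function names (`build_successors`, `local_scores_for`, `predict_links`) and the toy edge list are assumptions made for the example.

```python
from collections import defaultdict


def build_successors(edges):
    """Map each source vertex to the set of vertices it points to."""
    succ = defaultdict(set)
    for u, v in edges:
        succ[u].add(v)
    return succ


def local_scores_for(u, succ):
    """Score unlinked targets v for source u by counting paths u -> w -> v.

    Only the neighbourhood of u is read, which is what makes the score
    'local': each source vertex can be processed independently.
    """
    direct = succ.get(u, set())
    scores = defaultdict(int)
    for w in direct:
        for v in succ.get(w, set()):
            if v != u and v not in direct:
                scores[v] += 1
    return dict(scores)


def predict_links(edges, top_k=3):
    """Return the top_k candidate edges as (score, source, target) tuples."""
    succ = build_successors(edges)
    ranked = []
    # Each iteration is independent, so this loop could be distributed
    # across workers (an assumption about deployment, not shown here).
    for u in list(succ):
        for v, s in local_scores_for(u, succ).items():
            ranked.append((s, u, v))
    # Highest score first; ties broken by vertex names for determinism.
    ranked.sort(key=lambda t: (-t[0], t[1], t[2]))
    return ranked[:top_k]


# Hypothetical toy graph used only for illustration.
toy_edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
predictions = predict_links(toy_edges)
```

Because the score for a source vertex depends only on that vertex's neighbourhood, the outer loop has no shared mutable state, which is the property that lets local scores scale to very large graphs.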
Similar resources
Hierarchical Hyperlink Prediction for the WWW
The hyperlink prediction task, that of proposing new links between webpages, can be used to improve search engines, expand the visibility of web pages, and increase the connectivity and navigability of the web. Hyperlink prediction is typically performed on webgraphs composed of thousands or millions of vertices, where on average each webpage contains fewer than fifty links. Algorithms processin...
Parallel Algorithms for Force Directed Scheduling of Flattened and Hierarchical Signal Flow Graphs
In this paper, we present some novel algorithms for scheduling hierarchical signal flow graphs in the domain of high-level synthesis. With complex chips that need to be designed in the future, it is expected that the runtimes of these scheduling algorithms will be quite large. There are several key contributions of this paper. First, we develop a novel extension of the force-directed scheduling...
Evaluating Link Prediction on Large Graphs
Exploiting network data (i.e., graphs) is a rather particular case of data mining. The size and relevance of network domains justify research on graph mining, but also bring forth severe complications. Computational aspects like scalability and parallelism have to be reevaluated, as well as certain aspects of the data mining process. Among those are the methodologies used to evaluate graph...
Providing a Link Prediction Model based on Structural and Homophily Similarity in Social Networks
In recent years, with the growing number of online social networks, these networks have become one of the best markets for advertising and commerce, so studying these networks is very important. Most online social networks are growing and changing with new communications (new edges). Forecasting new edges in online social networks can give us a better understanding of the growth of these networ...
Link Prediction via Matrix Factorization
We propose to solve the link prediction problem in graphs using a supervised matrix factorization approach. The model learns latent features from the topological structure of a (possibly directed) graph, and is shown to make better predictions than popular unsupervised scores. We show how these latent features may be combined with optional explicit features for nodes or edges, which yields bett...